Mini-Project 03 explores New York City’s extensive green spaces, encompassing over 30,000 acres of public parkland across its 51 City Council districts in the five boroughs.
The project emphasizes the responsible acquisition of data from the NYC TreeMap and the NYC Department of City Planning, using paginated API access, local caching, and geospatial analysis to ensure data integrity and reproducibility.
The analysis integrates multiple spatial data sources to examine tree distribution, species diversity, and overall tree health, while making use of visualization techniques to clearly convey patterns and insights. This approach provides a deeper understanding of the environmental space around us and highlights the community value of New York’s urban forest.
The project also includes a government project design component, using district-level tree data to make an informed hypothetical Parks Department proposal.
II. Data Acquisition and Preparation
Two primary datasets were acquired and prepared: NYC City Council District Boundaries and NYC Tree Points.
City Council District Boundaries: The boundaries were downloaded as a static file from the NYC Department of City Planning site. The data was stored locally as a zip file, unzipped, and read using the st_read function. It was then transformed to the World Geodetic System (WGS 84) coordinate system to standardize projections and simplify integration with other geospatial data.
NYC Tree Points: The complete NYC TreeMap dataset was obtained from the NYC OpenData API in GeoJSON format. The data was downloaded page by page using the $limit and $offset parameters to ensure responsible API usage, and each page was saved locally to prevent repeated downloads. All pages were then combined into a single sf object using bind_rows. Subsetting and caching techniques were used to handle the large dataset efficiently.
Code
# Task 1 - Downloading City Council Districts data
library(sf)
library(dplyr)

# Create directory
if (!dir.exists(file.path("data", "mp03"))) {
  dir.create(file.path("data", "mp03"), showWarnings = FALSE, recursive = TRUE)
}

# Define paths and URL
NYC_COUNCIL_ZIP <- file.path("data", "mp03", "nycc_25c.zip")
NYC_COUNCIL_URL <- "https://s-media.nyc.gov/agencies/dcp/assets/files/zip/data-tools/bytes/city-council/nycc_25c.zip"

# Download ZIP only if the file does not exist
if (!file.exists(NYC_COUNCIL_ZIP)) {
  download.file(NYC_COUNCIL_URL, destfile = NYC_COUNCIL_ZIP, mode = "wb")
  message("Downloaded NYC City Council Districts ZIP file.")
} else {
  message("ZIP file already exists; skipping download.")
}

# Define shapefile path (the archive extracts into a nycc_25c/ subfolder)
NYC_COUNCIL_SHP <- file.path("data", "mp03", "nycc_25c", "nycc.shp")

# Unzip only if the shapefile does not exist
if (!file.exists(NYC_COUNCIL_SHP)) {
  unzip(NYC_COUNCIL_ZIP, exdir = file.path("data", "mp03"))
  message("Unzipped shapefile.")
} else {
  message("Shapefile already exists; skipping unzip.")
}

# Read shapefile
council_districts <- st_read(NYC_COUNCIL_SHP, quiet = TRUE)

# Check first few rows
# head(council_districts)

# Transform to WGS84
council_districts <- st_transform(council_districts, crs = "WGS84")

# Simplify geometry
council_districts <- council_districts |>
  mutate(geometry = st_simplify(geometry, dTolerance = 10))
Code
# Task 2 - Downloading NYC Open Data Forestry Tree Points data
# Downloading Tree Points using the API
library(httr2)

# Create folder to store data if it doesn't exist
if (!dir.exists("data/mp03")) dir.create("data/mp03", recursive = TRUE)

# Define API endpoint and paging parameters
base_url <- "https://data.cityofnewyork.us/resource/hn5i-inap.geojson"
limit <- 50000  # number of rows per request
offset <- 0     # start from the beginning
page <- 1       # page counter

# List to store each page
all_data <- list()

# Loop to download all pages
repeat {
  file_path <- file.path("data/mp03", paste0("trees_", page, ".geojson"))

  if (!file.exists(file_path)) {
    # Build request with limit and offset
    req <- request(base_url) |>
      req_url_query(`$limit` = limit, `$offset` = offset)

    # Perform the request
    resp <- req_perform(req)

    # Save raw response to a file
    writeBin(resp$body, file_path)
    message("Downloaded page ", page)
  } else {
    message("File already exists: page ", page, "; skipping download")
  }

  # Read the downloaded GeoJSON
  page_data <- st_read(file_path, quiet = TRUE)
  all_data[[page]] <- page_data

  # If fewer rows were returned than the limit, we reached the end
  if (nrow(page_data) < limit) break

  # Increment for next page
  page <- page + 1
  offset <- offset + limit
}

# Combine all pages into a single sf object
nyc_trees <- bind_rows(all_data)

# Read the first page of tree points
nyc_trees_page1 <- st_read("data/mp03/trees_1.geojson", quiet = TRUE)

# Creating a smaller sample for plotting
# set.seed(123) # reproducibility
# nyc_trees_sample <- nyc_trees_page1 |> slice_sample(n = 500000)
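The $limit/$offset paging logic can be checked without touching the network. The following is a minimal sketch in base R, assuming a hypothetical in-memory data source (`mock_rows`) and a hypothetical helper (`fetch_page`) that stand in for the real API. It is not the Socrata client itself; it only illustrates the stopping rule, including the case where the total row count is an exact multiple of the page size and one extra, empty request is needed to detect the end.

```r
# Hypothetical stand-in for the remote dataset: 12 records.
mock_rows <- sprintf("tree_%02d", 1:12)

# Hypothetical helper: returns up to `limit` records starting after `offset`,
# mimicking how $limit and $offset slice a result set.
fetch_page <- function(offset, limit) {
  if (offset >= length(mock_rows)) return(character(0))
  mock_rows[(offset + 1):min(offset + limit, length(mock_rows))]
}

# Page through the source until a short (or empty) page signals the end.
fetch_all <- function(limit) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(offset, limit)
    if (length(page) > 0) pages[[length(pages) + 1]] <- page
    if (length(page) < limit) break
    offset <- offset + limit
  }
  list(rows = unlist(pages), requests = offset / limit + 1)
}

# 12 rows with a page size of 5: pages of 5, 5, and 2 rows.
result <- fetch_all(limit = 5)

# 12 rows with a page size of 4: three full pages, then one empty page.
even <- fetch_all(limit = 4)
```

In the real download loop, the empty final page matters because reading an empty GeoJSON file may need separate handling; that behavior is not verified here.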
III. Exploring New York City’s Urban Tree Environment
New York City, divided into 51 City Council Districts, is home to over one million trees. As the map below shows, even in a city defined by towering skyscrapers, greenery plays a vital role, weaving pockets of nature throughout the urban landscape.
Code
# Task 3 - Mapping NYC trees
ggplot() +
  # Council district boundaries
  geom_sf(data = council_districts, fill = "gray95", color = "black", size = 6) +
  # NYC tree points
  geom_sf(data = nyc_trees, color = "forestgreen", alpha = 0.01, size = 0.2) +
  theme_minimal() +
  labs(
    title = "New York City Trees by City Council District",
    subtitle = paste(scales::comma(nrow(nyc_trees)), "Trees Across 51 Districts"),
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),  # center & bold title
    plot.subtitle = element_text(hjust = 0.5, size = 12),              # center subtitle
    plot.caption = element_text(size = 8),
    plot.margin = margin(10, 10, 10, 10)
  )
Code
# Task 4 - Creating district-level joins for analysis of tree coverage
# Assign each tree to the district that contains it
trees_with_district <- st_join(
  nyc_trees,
  council_districts,
  join = st_intersects
)

# Check first few rows
# head(trees_with_district)
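Conceptually, the spatial join asks, for every tree, which district polygon contains it. The sketch below illustrates that idea in base R with a ray-casting test on two made-up square "districts". It is an illustration only: `point_in_polygon`, `assign_district`, and the toy coordinates are hypothetical, and sf relies on its own geometry engines, which treat boundaries and spherical coordinates differently.

```r
# Ray-casting test: is point (px, py) inside the polygon with vertices
# (vx, vy)? Toggles `inside` each time a rightward horizontal ray from the
# point crosses a polygon edge.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses) {
      x_at_py <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

# Two hypothetical square districts sharing an edge at x = 2.
districts <- list(
  A = list(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2)),
  B = list(x = c(2, 4, 4, 2), y = c(0, 0, 2, 2))
)

# Three hypothetical trees; the last one lies outside every district.
trees <- data.frame(id = 1:3, x = c(1, 3, 5), y = c(1, 0.5, 5))

# Return the name of the first district containing the point, or NA.
assign_district <- function(px, py) {
  for (name in names(districts)) {
    d <- districts[[name]]
    if (point_in_polygon(px, py, d$x, d$y)) return(name)
  }
  NA_character_
}

trees$district <- mapply(assign_district, trees$x, trees$y)
trees_per_district <- table(trees$district)
```

Trees that fall outside every polygon end up with a missing district, which is why later summaries need to decide how to treat NA district values.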
Q1. City Council District 51, which covers the South Shore of Staten Island, has the most trees in the city. The district features numerous parks, including Great Kills Park, Blue Heron Park, Wolfe’s Pond Park, and Long Pond Park. It is also home to Freshkills Park, currently under development on top of a former landfill. Once completed, Freshkills Park will cover 2,200 acres, making it the largest park created in New York City since the 19th century.
Code
# Task 4. Q1 - Determine which council district has the most trees
trees_by_district <- trees_with_district |>
  st_set_geometry(NULL) |>
  group_by(CounDist) |>
  summarise(num_trees = n()) |>
  arrange(desc(num_trees))

# Select top 10 districts by number of trees
top_trees <- trees_by_district |>
  slice_max(num_trees, n = 10)

# Create bar chart
ggplot(top_trees, aes(x = reorder(CounDist, -num_trees), y = num_trees)) +
  geom_col(fill = "forestgreen") +
  labs(
    title = "Top 10 NYC Council Districts by Number of Trees",
    x = "Council District",
    y = "Number of Trees"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    plot.title = element_text(hjust = 0.5, face = "bold")
  ) +
  scale_y_continuous(labels = scales::comma)
Q2. New York City’s 7th City Council District is relatively small compared to other districts, covering roughly 5.5 km² of land. Despite its size, it has the highest tree density in the city, with approximately 2.8 trees per 10,000 square feet, or roughly 30 trees per hectare (the Shape_Area field is recorded in square feet, not square meters). This district covers several small neighborhoods in upper Manhattan, such as Hamilton Heights, Morningside Heights, Manhattanville, and Manhattan Valley. It also includes parts of Washington Heights and the Upper West Side.
Code
# Task 4. Q2 - Determine which council district has the highest density of trees
# Count trees per district (geometry dropped for speed)
tree_counts <- trees_with_district |>
  st_set_geometry(NULL) |>
  group_by(CounDist, Shape_Area) |>
  summarise(tree_count = n(), .groups = "drop")

# Compute tree density in trees per hectare
# Shape_Area is assumed to be in square feet (the shapefile's native units);
# 1 hectare = 107,639.1 square feet
tree_counts <- tree_counts |>
  mutate(trees_density = tree_count / (Shape_Area / 107639.1))

# Find the district with the highest density
highest_density <- tree_counts |>
  filter(trees_density == max(trees_density, na.rm = TRUE))
highest_density
Q3. The choropleth map below highlights the New York City districts with the highest fractions of dead trees. District 32, covering neighborhoods such as Howard Beach, Ozone Park, and the Rockaways, shows the most severe conditions, with 14.5% of its trees classified as dead. Other highly affected districts are concentrated on Staten Island and in other parts of Queens and Brooklyn, where dead-tree rates exceed 14%.
Code
# Task 4. Q3 - Determine which council district has the highest fraction of dead trees out of all trees
library(sf)
library(dplyr)
library(tidyverse)
library(scales)

# Filter out unknown and NA conditions
trees_clean <- trees_with_district |>
  filter(!is.na(tpcondition) & tpcondition != "Unknown")

# Compute fraction of dead trees per district
dead_tree_fraction <- trees_clean |>
  st_set_geometry(NULL) |>
  group_by(CounDist) |>
  summarise(
    total_trees = n(),
    dead_trees = sum(tpcondition == "Dead"),
    dead_fraction = dead_trees / total_trees
  ) |>
  arrange(desc(dead_fraction))

# Join the fraction back to district geometries to create the choropleth map
districts_with_fraction <- council_districts |>
  left_join(dead_tree_fraction, by = "CounDist")

ggplot(districts_with_fraction) +
  geom_sf(aes(fill = dead_fraction), color = "gray40", size = 0.5) +
  scale_fill_viridis_c(
    option = "C",
    direction = -1,
    trans = "sqrt",  # exaggerates small differences
    labels = scales::percent_format(accuracy = 1)
  ) +
  labs(
    title = "Fraction of Dead Trees by Council District",
    fill = "Dead Tree Fraction",
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    plot.caption = element_text(size = 8),
    plot.background = element_rect(fill = "white", color = NA),
    panel.background = element_rect(fill = "white", color = NA),
    panel.grid.major = element_line(color = "gray90", size = 0.3),
    panel.grid.minor = element_line(color = "gray95", size = 0.2),
    plot.margin = margin(10, 10, 10, 10)
  )
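The choice to drop "Unknown" and missing conditions before dividing changes the denominator, and therefore the fraction. The sketch below shows the effect on a small made-up table in base R; the eight rows are hypothetical and only the column names `CounDist` and `tpcondition` come from the analysis above.

```r
# Hypothetical toy data: two districts, eight trees.
trees <- data.frame(
  CounDist    = c(1, 1, 1, 1, 2, 2, 2, 2),
  tpcondition = c("Good", "Dead", "Unknown", "Fair", "Dead", "Dead", NA, "Good")
)

# Keep only trees with a known condition.
known <- trees[!is.na(trees$tpcondition) & trees$tpcondition != "Unknown", ]

# Fraction of dead trees among trees with a known condition.
dead_fraction <- tapply(known$tpcondition == "Dead", known$CounDist, mean)

# For comparison: the same fraction with every tree in the denominator.
dead_fraction_all <- tapply(trees$tpcondition %in% "Dead", trees$CounDist, mean)
```

In this toy table, filtering raises district 1 from 1/4 to 1/3 and district 2 from 2/4 to 2/3, so the two definitions should not be mixed when comparing districts.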
Q4. Manhattan is home to a diverse array of tree species. The chart below highlights the borough’s five most common species, with the Thornless Honeylocust leading the list. This fast-growing tree tolerates pollution and adapts to various soil types. It thrives in full sun, and in the fall its leaves turn yellow and often shrivel away, which reduces the need for raking.
Code
# Task 4. Q4 - Determine the most common tree species in Manhattan
# Add a borough column
trees_with_district <- trees_with_district |>
  mutate(
    Borough = case_when(
      CounDist >= 1 & CounDist <= 10 ~ "Manhattan",
      CounDist >= 11 & CounDist <= 18 ~ "Bronx",
      CounDist >= 19 & CounDist <= 32 ~ "Queens",
      CounDist >= 33 & CounDist <= 48 ~ "Brooklyn",
      CounDist >= 49 & CounDist <= 51 ~ "Staten Island",
      TRUE ~ NA_character_
    )
  )

# Filter for Manhattan
manhattan_trees <- trees_with_district |>
  filter(Borough == "Manhattan")

# Count tree species and find the most common for Manhattan
most_common_manhattan <- manhattan_trees |>
  st_set_geometry(NULL) |>  # remove geometry for speed
  group_by(genusspecies) |>
  summarise(num_trees = n(), .groups = "drop") |>
  arrange(desc(num_trees)) |>
  slice(1)  # top species

# Keep the top 5 species in Manhattan
top_species <- manhattan_trees |>
  st_set_geometry(NULL) |>
  group_by(genusspecies) |>
  summarise(num_trees = n(), .groups = "drop") |>
  arrange(desc(num_trees)) |>
  slice_head(n = 5)

# Wrap species names to ~20 characters per line
top_species <- top_species |>
  mutate(genusspecies_wrapped = str_wrap(genusspecies, width = 20))

# Plot the results
ggplot(top_species, aes(x = reorder(genusspecies_wrapped, num_trees), y = num_trees, fill = num_trees)) +
  geom_col(width = 0.7, show.legend = TRUE) +
  coord_flip() +
  scale_y_continuous(labels = scales::comma, expand = c(0, 0)) +
  scale_fill_gradient(
    low = "lightgreen",
    high = "darkgreen",
    labels = scales::comma  # add commas to legend labels
  ) +
  labs(
    title = "Manhattan's 5 Most Common Tree Species",
    x = NULL,
    y = "Number of Trees",
    fill = "Number of Trees",
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    plot.caption = element_text(size = 8, hjust = 1),
    axis.text.y = element_text(size = 9),
    axis.text.x = element_text(size = 9),
    axis.title.x = element_text(size = 10),
    legend.title = element_text(size = 10),
    panel.grid.major = element_blank(),  # remove all major grids
    panel.grid.minor = element_blank(),  # remove all minor grids
    plot.margin = margin(t = 10, r = 10, b = 10, l = 60)
  )
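The district-to-borough mapping can also be expressed as a lookup over range boundaries, which is easy to test in isolation. This is a sketch in base R using the same district ranges as the case_when call above; `district_to_borough` is a hypothetical helper name, and the ranges themselves remain the analysis's own assumption about how district numbers map to boroughs.

```r
# Lower bound of each borough's district range, taken from the ranges above.
borough_breaks <- c(1, 11, 19, 33, 49)
borough_names <- c("Manhattan", "Bronx", "Queens", "Brooklyn", "Staten Island")

# Hypothetical helper: map district numbers to borough names.
# Districts outside 1-51, and missing values, map to NA.
district_to_borough <- function(district) {
  idx <- findInterval(district, borough_breaks)
  idx[which(idx == 0)] <- NA_integer_
  out <- borough_names[idx]
  out[is.na(district) | district > 51] <- NA_character_
  out
}

boroughs <- district_to_borough(c(1, 10, 11, 32, 33, 48, 51, 52, 0, NA))
```

A lookup like this keeps the range boundaries in one place, so a check of the edge districts (10/11, 18/19, 32/33, 48/49) covers the whole mapping.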
Q5. Since the Thornless Honeylocust is the most common tree species in Manhattan, it is no surprise that the tree closest to Baruch College is one as well. This species is widely used in urban and suburban landscaping, often planted along streets and in parking lots due to its remarkable adaptability to urban stress.
Code
# Task 4. Q5 - Determine the tree species closest to Baruch's campus
# Create Baruch point - coordinates lat = 40.7394, lon = -73.9833
baruch_point <- st_sfc(st_point(c(-73.9833, 40.7394)), crs = 4326)

# Project to NY State Plane (feet)
trees_proj <- st_transform(trees_with_district, 2263)
baruch_proj <- st_transform(baruch_point, 2263)

# Compute distances in feet
trees_proj <- trees_proj |>
  mutate(distance_to_baruch_ft = as.numeric(st_distance(geometry, baruch_proj)))

# Find the closest tree
closest_tree <- trees_proj |>
  arrange(distance_to_baruch_ft) |>
  slice(1)
closest_tree$genusspecies
closest_tree$distance_to_baruch_ft
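As a sanity check on a nearest-tree result, the distance can be approximated directly from longitude and latitude with the haversine formula instead of projecting to State Plane coordinates. The sketch below swaps in that formula in base R; `haversine_m` is a hypothetical helper, the mean Earth radius is an assumed constant, and the three tree locations and species labels are invented for illustration, not taken from the dataset.

```r
# Great-circle distance in meters between lon/lat points (haversine formula).
# radius_m is an assumed mean Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Baruch College coordinates used in the analysis above.
baruch <- c(lon = -73.9833, lat = 40.7394)

# Hypothetical tree locations, invented for illustration only.
trees <- data.frame(
  genusspecies = c("species A", "species B", "species C"),
  lon = c(-73.9840, -73.9833, -73.9900),
  lat = c(40.7400, 40.7395, 40.7300)
)

trees$distance_m <- haversine_m(baruch[["lon"]], baruch[["lat"]], trees$lon, trees$lat)
trees$distance_ft <- trees$distance_m / 0.3048  # meters to feet

closest <- trees[which.min(trees$distance_m), ]
```

At city scale the haversine and projected distances should agree closely, so a large gap between them would point to a coordinate-order or CRS mistake.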
IV. Government Project Design
New York City Council District 1 encompasses several diverse and historically significant neighborhoods in Lower Manhattan, including the Financial District, Battery Park City, Chinatown, Tribeca, SoHo, and the Lower East Side. The district has experienced rapid development and land-use changes in recent years. The city’s rezoning efforts for new affordable housing have also placed pressure on existing green spaces. The latest debate on the future of the Elizabeth Street Garden highlights community concerns about the loss of accessible, high-quality green space. These changes underscore the urgent need to protect and expand the district’s urban forest.
To preserve environmental quality, improve public health, and prevent further loss of green areas, this proposal establishes the District 1 Tree Restoration and Expansion Initiative. The plan aims to improve tree health, expand tree coverage, and reinforce the district’s green infrastructure.
This initiative focuses on strengthening the district’s urban forest by:
Replacing 500 unhealthy trees currently classified as “poor”, “critical”, or “dead”.
Planting 1,000 new trees to expand coverage, increase the district’s biodiversity, and improve air quality.
Enhancing maintenance of tree health through regular pruning and watering, and targeted care for trees in unhealthy condition.
Promoting community engagement through monthly tree-related educational and recreational events to build environmental awareness.
New York City’s District 1 envisions a greener, more resilient landscape, where all residents have equitable access to high-quality green space. It aims to create an urban refuge that supports biodiversity, improves quality of life, and provides relief from the intensity of city living.
The following analysis provides the foundation for the District 1 Tree Restoration and Expansion Initiative, supporting the targeted needs in tree replacement, planting and maintenance.
TREE POPULATION SUMMARY
District 1 is home to 12,268 trees, placing it near the middle of the distribution among Manhattan districts. Although it has more trees than the districts with the fewest trees, such as District 5, which spans New York’s Upper East Side, Roosevelt Island, and a small part of East Harlem, its tree density is lower than that of several higher-ranking districts.
Code
# How many trees in District 1?
trees_in_d1 <- trees_with_district |>
  filter(CounDist == 1) |>
  nrow()
trees_in_d1

# Count trees by district for Manhattan
trees_manhattan_counts <- trees_with_district |>
  filter(Borough == "Manhattan") |>
  st_set_geometry(NULL) |>
  count(CounDist, name = "num_trees") |>
  arrange(desc(num_trees))
trees_manhattan_counts

trees_manhattan_counts <- trees_manhattan_counts |>
  mutate(
    highlight = ifelse(CounDist == 1, "District 1", "Other Districts"),
    CounDist = factor(CounDist)  # make it a factor for plotting
  )

# Plot
ggplot(
  trees_manhattan_counts,
  aes(x = reorder(CounDist, num_trees), y = num_trees, fill = highlight)
) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("District 1" = "darkgreen", "Other Districts" = "lightgreen")) +
  labs(
    title = "Number of Street Trees by Manhattan Council District",
    x = "Council District",
    y = "Number of Trees",
    fill = "",
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.text.x = element_text(size = 8),
    axis.text.y = element_text(size = 8),
    legend.position = "none",
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
TREE DENSITY AND DISTRIBUTION
Among Manhattan districts, District 1 has the third-largest area, covering about 7.23 km² of land. Despite its size, its tree density is among the lowest in the borough: 12,268 trees over roughly 723 hectares is about 17 trees per hectare, or fewer than 2 trees per 10,000 square feet. This highlights the pressing need to expand green space and increase tree coverage in the district.
Code
# Calculate tree density for a fairer comparison of trees per unit area
# Calculate land area for all districts in Manhattan
manhattan_districts_area <- council_districts |>
  filter(CounDist >= 1 & CounDist <= 10) |>
  mutate(
    area_sqm = st_area(geometry),
    area_km2 = as.numeric(area_sqm) / 1e6
  ) |>
  select(CounDist, area_sqm, area_km2) |>
  arrange(desc(area_km2))
# manhattan_districts_area

# Shape_Area is assumed to be in square feet (the shapefile's native units);
# 1 hectare = 107,639.1 square feet
SQFT_PER_HECTARE <- 107639.1

# Filter for Manhattan
manhattan_trees <- trees_with_district |>
  filter(Borough == "Manhattan")

# Tree density per district
trees_manhattan_density <- manhattan_trees |>
  st_set_geometry(NULL) |>  # remove geometry for faster processing
  group_by(CounDist, Shape_Area) |>
  summarise(num_trees = n(), .groups = "drop") |>
  mutate(tree_density_per_hectare = num_trees / (Shape_Area / SQFT_PER_HECTARE)) |>
  arrange(desc(tree_density_per_hectare))  # optional: sort by density

# View the results
trees_manhattan_density

# Create a choropleth of tree density in Manhattan
manhattan_districts <- council_districts |>
  filter(CounDist >= 1 & CounDist <= 10)

# Number of trees per district
trees_manhattan <- trees_with_district |>
  filter(Borough == "Manhattan") |>
  st_set_geometry(NULL) |>
  group_by(CounDist) |>
  summarise(num_trees = n(), .groups = "drop")

# Combine tree counts with district geometries
manhattan_districts <- manhattan_districts |>
  left_join(trees_manhattan, by = "CounDist") |>
  mutate(tree_density_per_hectare = num_trees / (Shape_Area / SQFT_PER_HECTARE))

# Plot choropleth map
ggplot(manhattan_districts) +
  geom_sf(aes(fill = tree_density_per_hectare), color = "black", size = 0.3) +
  geom_sf_text(aes(label = CounDist), size = 4, color = "white") +
  scale_fill_viridis_c(
    option = "D",  # default viridis palette
    direction = -1,
    name = "Trees per hectare"
  ) +
  labs(
    title = "Tree Density by Manhattan City Council District",
    subtitle = "Number of trees per hectare of total district area",
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(size = 8, hjust = 1),
    plot.margin = margin(20, 20, 20, 20),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.background = element_blank(),
    axis.text = element_blank(),   # remove axis text
    axis.ticks = element_blank(),  # remove ticks
    axis.title = element_blank()   # remove axis titles
  )
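Because density figures depend heavily on the unit of the area field, it helps to work one district through by hand. The sketch below uses the District 1 figures quoted in this section (12,268 trees, about 7.23 km²) and assumes, as a labeled assumption, that the shapefile's Shape_Area field is in square feet at 0.3048 m per foot. It shows how dividing by an area in square feet and multiplying by 10,000 gives trees per 10,000 square feet rather than trees per hectare.

```r
# District 1 figures quoted in the text above.
trees_d1 <- 12268
area_km2 <- 7.23

# Unit conversions (assumption: 1 foot = 0.3048 m).
area_ha   <- area_km2 * 100               # 1 km^2 = 100 hectares
area_sqft <- area_km2 * 1e6 / 0.3048^2    # square meters to square feet

# Trees per hectare, computed from the area in hectares.
density_per_ha <- trees_d1 / area_ha

# What "count / Shape_Area * 10000" yields if Shape_Area is in square feet:
# trees per 10,000 square feet, not per hectare.
density_per_10k_sqft <- trees_d1 / area_sqft * 10000

# The two figures differ by the number of square feet in a square meter.
ratio <- density_per_ha / density_per_10k_sqft
```

Under these assumptions the two densities differ by a factor of about 10.76, which is the kind of gap that signals a unit mismatch rather than a real difference between districts.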
TREE HEALTH PROFILE
Tree health in District 1 is a significant concern. Over 15% of its trees are classified as “poor”, “critical”, or “dead”, the third-highest share of unhealthy trees among Manhattan districts. Although District 10 has the highest share (18%), it also has a substantially larger total number of trees. District 1, in contrast, combines lower tree density with a relatively high proportion of unhealthy trees, which underscores the need for targeted care, maintenance, and other efforts to improve the health of the district’s urban forest.
Code
# Define condition order from best to worst
tp_levels <- c("Excellent", "Good", "Fair", "Poor", "Critical", "Dead")

# Filter out unknown and NA conditions and convert condition to an ordered factor
trees_health_manhattan <- manhattan_trees |>
  filter(!is.na(tpcondition) & tpcondition != "Unknown") |>
  st_set_geometry(NULL) |>
  mutate(
    tpcondition = factor(tpcondition, levels = tp_levels, ordered = TRUE),
    highlight = if_else(CounDist == 1, "District 1", "Other Districts")
  ) |>
  group_by(CounDist, tpcondition) |>
  summarise(num_trees = n(), .groups = "drop")

# Plot stacked bar chart for all tree conditions
ggplot(trees_health_manhattan, aes(x = factor(CounDist), y = num_trees, fill = tpcondition)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Greens") +
  labs(
    title = "Tree Health by Manhattan City Council District",
    x = "Manhattan Council District",
    y = "Number of Trees",
    fill = "Tree Condition",
    caption = "Data: NYC Open Data and NYC Planning"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.text.x = element_text(size = 8),
    legend.position = "right",
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Code
# Percentage of trees that are poor, critical, or dead in each Manhattan district
# Note: the denominator counts every tree, including unknown or missing conditions
manhattan_health_pct <- trees_with_district |>
  filter(Borough == "Manhattan") |>
  st_set_geometry(NULL) |>  # drop geometry so summarise() does not union points
  group_by(CounDist) |>
  summarise(
    num_poor_critical_dead = sum(tpcondition %in% c("Poor", "Critical", "Dead"), na.rm = TRUE),
    total_trees = n(),
    pct_poor_critical_dead = (num_poor_critical_dead / total_trees) * 100,
    .groups = "drop"
  ) |>
  arrange(desc(pct_poor_critical_dead))
manhattan_health_pct